hierarchical clustering
Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search
Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, the method has an underdeveloped analytical foundation. A well-understood foundation would both support the methods currently in use and help guide future improvements. The goal of this paper is to give an analytic framework for better understanding observations seen in practice. This paper considers the dual of a problem framework for hierarchical clustering introduced by Dasgupta.
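Dasgupta's framework gives every candidate tree a concrete score, which is what makes principled comparisons of algorithms possible. As a minimal illustration (the tuple representation and helper names here are my own, not the paper's), the objective charges each similar pair by the number of leaves under their lowest common ancestor:

```python
def dasgupta_cost(tree, weights):
    """Dasgupta's cost: sum over item pairs of w(i, j) times the number of
    leaves under the lowest common ancestor of i and j (lower is better
    when w encodes similarity).  `tree` is a nested 2-tuple whose leaves
    are item labels; `weights` maps frozenset pairs to similarities."""
    def leaves(t):
        return [t] if not isinstance(t, tuple) else [x for c in t for x in leaves(c)]

    def cost(t):
        if not isinstance(t, tuple):
            return 0.0
        left, right = t
        n = len(leaves(t))
        # every pair split at this node has its LCA here
        split = sum(weights.get(frozenset((i, j)), 0.0)
                    for i in leaves(left) for j in leaves(right))
        return n * split + cost(left) + cost(right)

    return cost(tree)

# Toy instance: a and b are very similar, c is loosely related to both.
w = {frozenset(('a', 'b')): 1.0,
     frozenset(('a', 'c')): 0.1,
     frozenset(('b', 'c')): 0.1}
good = dasgupta_cost((('a', 'b'), 'c'), w)  # similar pair merged low in the tree
bad  = dasgupta_cost((('a', 'c'), 'b'), w)  # the less natural tree
```

On this toy instance the tree that merges the similar pair first scores 2.6 against 3.5 for the alternative, matching the intuition the objective is meant to capture.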
Hierarchical Clustering Beyond the Worst-Case
Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn
Finally, we report an empirical evaluation on synthetic and real-world data showing that our proposed SVD-based method does indeed achieve a better cost than other widely used heuristics and also yields better classification accuracy when the underlying problem is multi-class classification.
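The abstract does not spell out the SVD-based method, but a common divisive heuristic in this family splits the data along its top singular direction. The sketch below illustrates that generic idea, not the authors' exact algorithm:

```python
import numpy as np

def svd_split(X):
    """One step of a top-down SVD-based split (a generic sketch of the
    idea, not the paper's method): project the mean-centred rows of X onto
    the top right singular vector and split by the sign of the projection."""
    Xc = X - X.mean(axis=0)
    # top right singular vector of the centred data matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[0]
    return proj >= 0  # boolean mask: True = one side of the split

# Two well-separated groups on the first coordinate
X = np.array([[0.0, 0.0], [0.1, -0.1], [10.0, 0.0], [10.1, 0.1]])
mask = svd_split(X)
```

Applied recursively to each side, this yields a full top-down hierarchy; the top singular direction is the direction of maximum variance, so each split separates the most spread-out structure first.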
- Asia > Afghanistan > Parwan Province > Charikar (0.05)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.05)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
GPT-HTree: A Decision Tree Framework Integrating Hierarchical Clustering and Large Language Models for Explainable Classification
Pei, Te, Alican, Fuat, Yin, Aaron Ontoyin, Ihlamur, Yigit
Decision trees are fundamental tools in machine learning (ML), prized for their interpretability and simplicity in classification tasks. By providing clear decision paths, they enable users to understand and trust the reasoning behind predictions. However, their effectiveness diminishes when applied to heterogeneous datasets comprising entities with varying characteristics. Uniform decision paths often fail to account for the nuanced differences among diverse segments, leading to oversimplified or misleading classifications. Unsupervised clustering methods, on the other hand, excel in discovering latent structures within complex datasets. These methods, including hierarchical clustering, k-means, and DBSCAN, are powerful tools for segmenting populations into meaningful clusters without requiring predefined labels. While they are effective for uncovering hidden patterns, their primary drawback is a lack of explainability. Clusters produced by unsupervised methods often lack intuitive descriptions or actionable insights, making it difficult to interpret their relevance or apply them in practical decision-making scenarios.
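The hybrid the abstract motivates, segment the heterogeneous data first, then fit an interpretable classifier inside each segment, can be sketched with standard scikit-learn pieces. This is a generic illustration of the cluster-then-classify pattern, not GPT-HTree itself:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Two sub-populations with opposite label rules on feature 0 -- a single
# global tree would have to entangle them; per-segment trees stay simple.
X1 = rng.normal(0, 1, (50, 2));  y1 = (X1[:, 0] > 0).astype(int)
X2 = rng.normal(10, 1, (50, 2)); y2 = (X2[:, 0] < 10).astype(int)
X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])

# Step 1: hierarchical clustering segments the population.
segments = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# Step 2: a small, interpretable decision tree per segment.
trees = {}
for c in np.unique(segments):
    m = segments == c
    trees[c] = DecisionTreeClassifier(max_depth=2).fit(X[m], y[m])

# Each shallow tree recovers its own sub-population's rule.
acc = np.mean([trees[c].score(X[segments == c], y[segments == c])
               for c in np.unique(segments)])
```

The design point is that each segment's decision path stays short and readable, whereas one tree over the whole population would need extra splits just to separate the segments.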
- North America > United States > New York (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology (0.68)
- Banking & Finance > Capital Markets (0.48)
Reviews: Flattening a Hierarchical Clustering through Active Learning
This paper derives complexity results for active-learning queries to hierarchical clustering. The result is a partition or "cut", c, of the cluster tree, where the "flat" clustering is defined by the clusters at the leaves of a subtree of nodes AB(c) that shares its root with the original cluster tree. Learning occurs by making pairwise judgments on items (leaf nodes). All pairwise judgments form a "ground truth" matrix \Sigma. Given consistency conditions, this is an equivalent way to represent a clustering.
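Independently of the query-complexity results, the "cut" itself is easy to picture with standard tooling: build a dendrogram, then cut it to obtain a flat clustering. The following is generic scipy usage illustrating the cut, not the paper's active-learning procedure:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Four 1-D points forming two obvious groups.
X = np.array([[0.0], [0.2], [5.0], [5.3]])

# Build the cluster tree with average linkage...
Z = linkage(X, method='average')

# ...then "flatten" it by cutting into 2 clusters; every choice of cut
# yields a different flat clustering from the same tree.
labels = fcluster(Z, t=2, criterion='maxclust')
```

The active-learning question studied in the paper is, in effect, how many pairwise judgments are needed to identify the right such cut.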
Reviews: Hierarchical Clustering via Spreading Metrics
The paper of Dasgupta introduced a nice objective for hierarchical clustering, and presented a fairly simple algorithm for approximating the objective. I like this objective since it gives a nice principled way of comparing different algorithms (potentially new ones) for hierarchical clustering, as opposed to just showing bounds for linkage-based heuristics. I think the algorithm and analysis in this paper is very elegant. Every hierarchical clustering corresponds to an ultrametric; but they characterize the ultrametrics that correspond to the objective defined by Dasgupta, using some very nice properties. They use this characterization to come up with an LP and its rounding.
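The ultrametric correspondence the review praises is concrete enough to check directly: a metric comes from a hierarchical clustering exactly when it satisfies the strong triangle inequality. A minimal sketch (representation is my own, not the paper's):

```python
from itertools import combinations

def is_ultrametric(dist, points):
    """Check the strong triangle inequality d(x, y) <= max(d(x, z), d(z, y)).
    `dist` maps frozenset pairs to distances.  Every dendrogram induces such
    a metric: d(x, y) is the height of the lowest common ancestor of x, y."""
    d = lambda a, b: dist[frozenset((a, b))]
    return all(d(x, y) <= max(d(x, z), d(z, y)) + 1e-12
               for x, y in combinations(points, 2)
               for z in points if z not in (x, y))

# Heights read off a dendrogram: a and b merge at height 1, c joins at 2.
tree_metric = {frozenset(('a', 'b')): 1.0,
               frozenset(('a', 'c')): 2.0,
               frozenset(('b', 'c')): 2.0}
# A perfectly valid metric that no dendrogram can realise.
plain_metric = {frozenset(('a', 'b')): 1.0,
                frozenset(('b', 'c')): 2.0,
                frozenset(('a', 'c')): 3.0}
```

In the ultrametric, the two largest of any three pairwise distances are equal, which is exactly the "merge height" structure; the second metric violates this (3 > max(1, 2)), so it cannot arise from any tree.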
Enhancing Indoor Mobility with Connected Sensor Nodes: A Real-Time, Delay-Aware Cooperative Perception Approach
Ning, Minghao, Cui, Yaodong, Yang, Yufeng, Huang, Shucheng, Liu, Zhenan, Alghooneh, Ahmad Reza, Hashemi, Ehsan, Khajepour, Amir
This paper presents a novel real-time, delay-aware cooperative perception system designed for intelligent mobility platforms operating in dynamic indoor environments. The system comprises a network of multi-modal sensor nodes and a central node that collectively provide perception services to mobility platforms. The proposed Hierarchical Clustering Considering the Scanning Pattern and Ground Contacting Feature based Lidar-Camera Fusion improves intra-node perception in crowded environments. The system also features delay-aware global perception to synchronize and aggregate data across nodes. To validate our approach, we introduce the Indoor Pedestrian Tracking dataset, compiled from data captured by two indoor sensor nodes. Our experiments demonstrate significant improvements over baselines in detection accuracy and robustness against delays. The dataset is available in the repository: https://github.com/NingMingHao/MVSLab-IndoorCooperativePerception
- North America > Canada > Alberta (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering
Similarity-based Hierarchical Clustering (HC) is a classical unsupervised machine learning algorithm that has traditionally been solved with heuristic algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree. In this work, we provide the first continuous relaxation of Dasgupta's discrete optimization problem with provable quality guarantees. The key idea of our method, HypHC, is showing a direct correspondence from discrete trees to continuous representations (via the hyperbolic embeddings of their leaf nodes) and back (via a decoding algorithm that maps leaf embeddings to a dendrogram), allowing us to search the space of discrete binary trees with continuous optimization. Building on analogies between trees and hyperbolic space, we derive a continuous analogue for the notion of lowest common ancestor, which leads to a continuous relaxation of Dasgupta's discrete objective.
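For contrast with the continuous relaxation, the classical heuristic the abstract mentions fits in a few lines. This is a naive toy version of average linkage (my own sketch, deliberately unoptimised) that records each merge as a nested tuple of the original indices:

```python
import numpy as np

def average_linkage(points):
    """Naive average-linkage agglomeration: repeatedly merge the two
    clusters with the smallest mean pairwise distance, recording each
    merge as a nested tuple of original point indices."""
    P = np.asarray(points, dtype=float)
    members = {i: (i,) for i in range(len(P))}  # cluster id -> leaf indices
    trees = {i: i for i in range(len(P))}       # cluster id -> subtree

    def avg_dist(a, b):
        return np.mean([np.linalg.norm(P[i] - P[j])
                        for i in members[a] for j in members[b]])

    while len(members) > 1:
        keys = sorted(members)
        a, b = min(((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
                   key=lambda p: avg_dist(*p))
        members[a] += members[b]
        trees[a] = (trees[a], trees[b])
        del members[b], trees[b]
    return next(iter(trees.values()))

tree = average_linkage([[0.0], [0.1], [10.0]])
```

The discrete, greedy merge choice is exactly what HypHC replaces: rather than committing to one merge at a time, it searches a continuous (hyperbolic) space of leaf embeddings and decodes a dendrogram afterwards.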